| name | size | modified_date | id |
|---|---|---|---|
| Preprocess.R | 2.6000e+03 | 06/02/2024 10:43 PM | syn60236613 |
| Sample_annotation.csv | 8.9500e+04 | 05/30/2024 8:37 AM | syn60157686 |
| Probe_array.csv | 7.3000e+07 | 05/30/2024 9:04 AM | syn60157718 |
| Probe_annotation.csv | 5.6870e+08 | 05/30/2024 8:49 AM | syn60157694 |
| DetectionP_subchallenge1.csv | 2.7340e+09 | 05/24/2024 5:35 AM | syn59870646 |
| DetectionP_subchallenge2.csv | 4.9870e+09 | 05/24/2024 5:51 AM | syn59872208 |
| Beta_raw_subchallenge1.csv.gz | 5.8690e+09 | 05/24/2024 5:19 AM | syn59868755 |
| Beta_raw_subchallenge2.csv.gz | 1.1008e+10 | 05/24/2024 2:19 PM | syn59898399 |
| Dataset accession | N |
|---|---|
| GSE128827 | 5 |
| GSE228149 | 5 |
| GSE200659 | 11 |
| E_MTAB_9312 | 13 |
| GSE74738 | 13 |
| GSE108567 | 16 |
| GSE75196 | 24 |
| GSE115508 | 25 |
| GSE98224 | 48 |
| GSE69502 | 52 |
| GSE204977 | 55 |
| GSE169598 | 64 |
| GSE100197 | 95 |
| GSE232778 | 187 |
| GSE144129 | 210 |
| GSE167885 | 242 |
| GSE75248 | 334 |
| GSE71678 | 343 |
| V1 | V2 |
|---|---|
| anencephaly | diandric_triploid |
| anencephaly | hellp |
| anencephaly | ivf |
| anencephaly | lga |
| anencephaly | miscarriage |
| anencephaly | spina_bifida |
| anencephaly | subfertility |
| diandric_triploid | chorioamnionitis |
| diandric_triploid | ivf |
| diandric_triploid | lga |
| diandric_triploid | sga |
| diandric_triploid | subfertility |
| fgr | anencephaly |
| fgr | chorioamnionitis |
| fgr | ivf |
| fgr | spina_bifida |
| fgr | subfertility |
| hellp | chorioamnionitis |
| ivf | chorioamnionitis |
| ivf | hellp |
| ivf | subfertility |
| miscarriage | chorioamnionitis |
| miscarriage | ivf |
| miscarriage | lga |
| miscarriage | sga |
| miscarriage | subfertility |
| ms_ivf | anencephaly |
| ms_ivf | diandric_triploid |
| ms_ivf | ivf |
| ms_ivf | miscarriage |
| ms_ivf | spina_bifida |
| ms_ivf | subfertility |
| ms_subfertility | hellp |
| ms_subfertility | ivf |
| ms_subfertility | subfertility |
| pe | anencephaly |
| pe | diandric_triploid |
| pe | hellp |
| pe | pe_onset |
| pe | spina_bifida |
| pe_onset | anencephaly |
| pe_onset | chorioamnionitis |
| pe_onset | diandric_triploid |
| pe_onset | hellp |
| pe_onset | ivf |
| pe_onset | miscarriage |
| pe_onset | spina_bifida |
| pe_onset | subfertility |
| preterm | diandric_triploid |
| preterm | ivf |
| preterm | miscarriage |
| preterm | subfertility |
| sga | hellp |
| sga | ivf |
| sga | lga |
| sga | subfertility |
| spina_bifida | diandric_triploid |
| spina_bifida | hellp |
| spina_bifida | ivf |
| spina_bifida | miscarriage |
| spina_bifida | subfertility |
| subfertility | chorioamnionitis |
| subfertility | hellp |
[R console output: a list with one empty entry per condition (fgr, pe, pe_onset, preterm, anencephaly, spina_bifida, gdm, diandric_triploid, miscarriage, lga, subfertility, hellp, chorioamnionitis), followed by a second list (ivf, subfertility); the entry contents were not recovered.]
The normal-GA model was trained using samples without any of 12 of the 13 available conditions. The 13 conditions were significantly correlated with GA: (1) fetal growth restriction (FGR); (2) preeclampsia (PE); (3) PE onset (early/late/not applicable); (4) hemolysis, elevated liver enzymes, and low platelets (HELLP) syndrome; (5) anencephaly; (6) spina bifida; (7) diandric triploid; (8) miscarriage; (9) preterm delivery; (10) gestational diabetes mellitus (GDM); (11) large-for-gestational-age (LGA) infant; (12) subfertility; and (13) chorioamnionitis. We excluded preterm delivery from the 12 because it is related to the outcome, i.e., GA, simply by definition.
The Res-CPG-GA model consisted of two models, one each for <37 and ≥37 weeks’ gestation as estimated by the normal-GA model. The number of models and the periods were determined according to clinical knowledge and by pursuing a normal distribution of residual GA. Pregnancy termination at <37 weeks’ gestation is presumably related to a medical indication, whereas during term pregnancy both medical and non-medical indications might be encountered. Fitting residuals in term pregnancy using beta values might therefore lead to overfitting. Nonetheless, we used two approaches, i.e., predicting residual GA during: (1) <37 weeks’ gestation only (Res-CPG-PR-GA); and (2) both <37 and ≥37 weeks’ gestation (Res-CPG-GA). Training of the latter model was restricted to samples with absolute residual GA >0.05. This threshold was chosen by visual judgement of a quantile-quantile (QQ) plot to pursue linearity, so that the residuals could be easily fitted by elastic net regression. We assumed that this approach would avoid overfitting on predicted GA that might already be well estimated by the normal-GA model. We tested this assumption by training the Res-CPG model without the residual-GA restriction (Res-CPG-rev-GA).
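The sample selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and toy values are hypothetical, the residual is assumed to be observed GA minus the normal-GA estimate, and the elastic net fitting step itself is not shown.

```python
# Hypothetical helper names and toy values; illustrative only.
TERM_CUTOFF_WEEKS = 37.0   # <37 vs >=37 weeks, as stated in the text
MIN_ABS_RESIDUAL = 0.05    # training restriction for Res-CPG-GA, as stated in the text


def residual_ga(true_ga, normal_ga_pred):
    """Assumed sign convention: observed GA minus the normal-GA estimate."""
    return true_ga - normal_ga_pred


def split_training_samples(samples, restrict=True):
    """Group (true_ga, normal_ga_pred) pairs by the period estimated by normal-GA.

    With restrict=True, samples whose absolute residual is not above the
    threshold are dropped (Res-CPG-GA setting); restrict=False keeps all
    samples (Res-CPG-rev-GA setting).
    """
    groups = {"lt37": [], "ge37": []}
    for true_ga, pred in samples:
        res = residual_ga(true_ga, pred)
        if restrict and abs(res) <= MIN_ABS_RESIDUAL:
            continue  # assumed to be already well estimated by normal-GA
        key = "lt37" if pred < TERM_CUTOFF_WEEKS else "ge37"
        groups[key].append((pred, res))
    return groups


toy = [(30.0, 33.5), (38.02, 38.0), (41.0, 39.2), (36.0, 36.9)]
restricted = split_training_samples(toy)
unrestricted = split_training_samples(toy, restrict=False)
```

Each group would then feed its own residual model; in this toy data the restriction removes only the sample whose residual is about 0.02.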
The Res-Conds-GA model was similar to the Resfull-GA model, but its predictors were the products of each condition's predicted probability and the residual GA estimated by a model for the corresponding condition. Specifically, for each condition we trained a model using beta values of DMPs among samples with that condition. The rationale was that the conditions have different trajectories of when pregnancies are terminated, and each pregnant woman has a different set of probabilities of the conditions. For comparison, we used the true probabilities and the predicted probabilities of the conditions, the latter from either the prediction models (Resfull-GA) or the imputation models (Res-GA), and we also omitted the multiplication procedure (all probabilities set to 1). This comparison tested the importance of phenotypic prediction performance for the accuracy and robustness of the parent model.
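A minimal sketch of the multiplication procedure, assuming hypothetical names and made-up numbers: each predictor is the product of a condition's probability and the residual GA estimated by that condition's model, and setting every probability to 1 gives the comparison variant without multiplication.

```python
def product_features(probabilities, condition_residuals, use_probabilities=True):
    """Build one predictor per condition: probability x condition-specific residual GA.

    `probabilities` and `condition_residuals` are dicts keyed by condition name.
    With use_probabilities=False every weight is 1, i.e. no multiplication.
    """
    features = {}
    for cond, res in condition_residuals.items():
        weight = probabilities[cond] if use_probabilities else 1.0
        features[cond] = weight * res
    return features


# Toy values for one sample (not taken from the study).
probs = {"fgr": 0.8, "pe": 0.1, "hellp": 0.0}
residuals = {"fgr": -2.0, "pe": -4.0, "hellp": -5.0}

weighted = product_features(probs, residuals)
unweighted = product_features(probs, residuals, use_probabilities=False)
```

With true (0/1) probabilities each feature is either zero or the full residual, whereas predicted probabilities give intermediate values.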
The Res-Comb-GA and Res-CPG-Comb-GA models were considered because conditions other than the 12 above might also affect pregnancy termination. The two models differed in stacking order. In the Res-Comb-GA model, we limited the degrees of freedom of residual fitting using known phenotype information (Res-Conds-GA); thus, the second model (Res-CPG-GA) only fitted the unexplained residual GA, simply to boost the prediction. Meanwhile, in the Res-CPG-Comb-GA model, the Res-CPG-GA model was assumed to fit both explained and unexplained residual GA in general. Since the resulting residual was presumably smaller, the overall prediction was less affected by the imperfect accuracies of Res-Conds-GA. Both the Res-Comb-GA and Res-CPG-Comb-GA models consisted of three models for <37, ≥37 and ≤40, and >40 weeks’ gestation as estimated by the normal-GA model. The number of models and the periods were again determined according to clinical knowledge and by pursuing a normal distribution of residual GA. The estimated delivery date falls at 40 weeks’ gestation. Before this date, a pregnant woman might seek termination in advance because of a medical condition, whereas a woman with a normal pregnancy might seek termination from the estimated delivery date onward. We used three approaches, i.e., predicting residual GA during: (1) <37 weeks’ gestation only (Res-Comb-PR-GA); (2) both <37 weeks’ and ≥37 to ≤40 weeks’ gestation, i.e., preterm plus term before the estimated delivery date (Res-Comb-PRTB-GA); and (3) all three periods (Res-Comb-GA).
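The three periods and three approaches above can be illustrated with a short sketch. The period labels, function names, and the additive way the two residual corrections are combined are assumptions made for illustration; the text does not spell out the exact combination rule.

```python
def period(normal_ga_pred):
    """Assign a period from the GA (weeks) estimated by the normal-GA model."""
    if normal_ga_pred < 37:
        return "preterm"
    if normal_ga_pred <= 40:
        return "term_before_edd"  # >=37 and <=40 weeks
    return "after_edd"            # >40 weeks


# Periods in which each approach predicts residual GA, per the text.
APPROACH_PERIODS = {
    "Res-Comb-PR-GA": {"preterm"},
    "Res-Comb-PRTB-GA": {"preterm", "term_before_edd"},
    "Res-Comb-GA": {"preterm", "term_before_edd", "after_edd"},
}


def stacked_prediction(normal_ga_pred, conds_residual, cpg_residual, approach):
    """Assumed additive stacking: normal-GA estimate plus both residual corrections.

    Outside the periods covered by the approach, the normal-GA estimate is
    returned unchanged.
    """
    if period(normal_ga_pred) not in APPROACH_PERIODS[approach]:
        return normal_ga_pred
    return normal_ga_pred + conds_residual + cpg_residual


uncorrected = stacked_prediction(38.0, -0.5, 0.25, "Res-Comb-PR-GA")
corrected = stacked_prediction(38.0, -0.5, 0.25, "Res-Comb-GA")
```

Here a sample estimated at 38 weeks is left uncorrected by the preterm-only approach but corrected when all three periods are modeled.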
We considered that the first-iteration models underperformed on the validation set compared with other participants’ models using the 450k probes (Figure 2). In the training set, we also observed that these models did not consistently beat the top performer across the phenotypic subgroups (Figure 3). Hence, the phenotype information might not be as useful as expected. Nevertheless, the performance of the Resfull-GA model was slightly more consistent than that of the Res-, Res-CR-, and Res-Seq-GA models, although their performances were better on the whole training set. In the first iteration, we learned that the accuracy of predicting the conditions matters, since Resfull-GA employed the prediction models, which had a larger training size, instead of the imputation models, which had a smaller training size. The prediction models that used the true probabilities underperformed. However, we argue that this was because the degrees of freedom were lacking owing to the binary (0/1) values. This finding also indicated that the predicted probabilities lacked degrees of freedom.
The Res-CPG-GA model family mostly outperformed the first-iteration models, both in validation-set rank, which jumped up to the top-3 positions based on RMSE, and in subgroup-wise winning rates (Figure 2). The Res-CPG-GA and Res-CPG-rev-GA models for <37 and ≥37 weeks’ gestation consistently outperformed the Res-CPG-PR-GA model across the subgroups (Figure 3), although the Res-CPG-PR-GA model coincidentally outperformed both of them in the validation set (Figure 2). This finding contradicted our assumption about the overfitting potential (see Modeling strategies). We were then inspired to improve the degrees of freedom of the phenotypic predicted probabilities, resulting in the Res-Conds-GA model.
Overall, the Res-Conds-GA model family did not outperform the Res-CPG and Res-CPG-rev models (Figure 2). The winning rates across the subgroups were also similar between the two model families, but the Res-Conds-GA models finally won in a few subgroups in which the previous models had consistently lost, e.g., diandric triploid and miscarriage (Figure 3). Conversely, the Res-Conds-GA models lost in 3 of the 16 datasets of origin, compared with only 1 dataset (GSE74738) for the Res-CPG and Res-CPG-rev models. The Res-Conds-GA models using the true and predicted probabilities were more accurate and robust than those using the imputation models or no multiplication (Table 1). In the validation set, the Res-Conds-GA model using the predicted probabilities had a lower RMSE (1.0949) than the Res-CPG-rev (1.1867), Res-CPG (1.2025), and Res-CPG-PR (1.1381) models, but not lower than that of the top performer (1.076) (Table 1).
Eventually, we were inspired to combine the Res-CPG-GA and Res-Conds-GA modeling approaches. The Res-Comb-GA model won in almost all subgroups, the exception being 1 dataset of origin (GSE228149) (Figure 4), and had almost the same RMSE as the top performer in the validation set (1.077 vs. 1.076) (Table 1). Both r values were 0.966, but the MAE of Res-Comb-GA was slightly higher than that of the top performer (0.888 vs. 0.863). Meanwhile, the Res-CPG-Comb-GA model had the poorest performance of all the multistage prediction models (Figure 3). This finding underlines the importance of phenotype-related information, even though the conditions were not predicted with perfect accuracy. Therefore, considering its robustness across subgroups, the Res-Comb-GA model has a higher chance of winning in the test set, particularly in sub-challenge 1. Hence, we stopped the iterative model development.
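For reference, the three metrics compared above (RMSE, MAE, and Pearson's r) can be computed as in the following sketch. The function names and toy GA values are illustrative and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted GA."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted GA."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy GA values in weeks (hypothetical).
ga_true = [30.0, 34.0, 38.0, 40.0]
ga_pred = [31.0, 33.0, 38.0, 42.0]

toy_rmse = rmse(ga_true, ga_pred)
toy_mae = mae(ga_true, ga_pred)
toy_r = pearson_r(ga_true, ga_pred)
```

RMSE penalizes large errors more heavily than MAE, which is one way two models can have nearly equal RMSE but different MAE.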
| model | metric | avg | lb | ub | current_best | win | sub | code | rank | val |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal-GA | RMSE | 1.859 | 1.851 | 1.866 | 1.076 | No | ||||
| Normal-GA | MAE | 1.142 | 1.137 | 1.148 | 0.863 | No | ||||
| Normal-GA | r | 0.967 | 0.967 | 0.968 | 0.966 | Yes | ||||
| Res-GA (true) | RMSE | 1.276 | 1.272 | 1.281 | 1.076 | No | ||||
| Res-GA (true) | MAE | 0.868 | 0.866 | 0.871 | 0.863 | No | ||||
| Res-GA (true) | r | 0.981 | 0.981 | 0.981 | 0.966 | Yes | ||||
| Res-GA (true)* | RMSE | 1.172 | 1.168 | 1.176 | 1.076 | No | ||||
| Res-GA (true)* | MAE | 0.782 | 0.780 | 0.785 | 0.863 | Yes | ||||
| Res-GA (true)* | r | 0.984 | 0.984 | 0.984 | 0.966 | Yes | ||||
| Res-GA | RMSE | 1.521 | 1.515 | 1.527 | 1.076 | No | ||||
| Res-GA | MAE | 1.027 | 1.024 | 1.030 | 0.863 | No | ||||
| Res-GA | r | 0.973 | 0.973 | 0.973 | 0.966 | Yes | ||||
| Res-GA* | RMSE | 0.775 | 0.772 | 0.777 | 1.076 | Yes | 3 | clearcut | 8 | 1.4369 |
| Res-GA* | MAE | 0.478 | 0.476 | 0.479 | 0.863 | Yes | 3 | clearcut | 8 | 1.132 |
| Res-GA* | r | 0.993 | 0.993 | 0.993 | 0.966 | Yes | 3 | clearcut | 8 | 0.9416 |
| Res-CR-GA | RMSE | 1.537 | 1.531 | 1.543 | 1.076 | No | ||||
| Res-CR-GA | MAE | 1.026 | 1.023 | 1.029 | 0.863 | No | ||||
| Res-CR-GA | r | 0.972 | 0.972 | 0.973 | 0.966 | Yes | ||||
| Res-CR-GA* | RMSE | 0.782 | 0.780 | 0.784 | 1.076 | Yes | 1 | testthewaters | 7 | 1.3706 |
| Res-CR-GA* | MAE | 0.488 | 0.486 | 0.489 | 0.863 | Yes | 1 | testthewaters | 7 | 1.0735 |
| Res-CR-GA* | r | 0.993 | 0.993 | 0.993 | 0.966 | Yes | 1 | testthewaters | 7 | 0.9476 |
| Resfull-GA | RMSE | 1.329 | 1.324 | 1.334 | 1.076 | No | ||||
| Resfull-GA | MAE | 0.909 | 0.906 | 0.911 | 0.863 | No | ||||
| Resfull-GA | r | 0.979 | 0.979 | 0.980 | 0.966 | Yes | ||||
| Resfull-GA* | RMSE | 0.946 | 0.941 | 0.950 | 1.076 | Yes | 2 | isitthedarkhorse | 6 | 1.3552 |
| Resfull-GA* | MAE | 0.613 | 0.611 | 0.615 | 0.863 | Yes | 2 | isitthedarkhorse | 6 | 1.073 |
| Resfull-GA* | r | 0.990 | 0.990 | 0.990 | 0.966 | Yes | 2 | isitthedarkhorse | 6 | 0.9505 |
| Res-Seq-GA | RMSE | 1.642 | 1.634 | 1.650 | 1.076 | No | ||||
| Res-Seq-GA | MAE | 1.082 | 1.078 | 1.085 | 0.863 | No | ||||
| Res-Seq-GA | r | 0.969 | 0.968 | 0.969 | 0.966 | Yes | ||||
| Res-Seq-GA* | RMSE | 1.198 | 1.194 | 1.201 | 1.076 | No | ||||
| Res-Seq-GA* | MAE | 0.844 | 0.842 | 0.846 | 0.863 | Yes | ||||
| Res-Seq-GA* | r | 0.986 | 0.986 | 0.986 | 0.966 | Yes | ||||
| Res-CPG-PR-GA | RMSE | 1.069 | 1.062 | 1.076 | 1.076 | No | 4 | stepback | 3 | 1.1381 |
| Res-CPG-PR-GA | MAE | 0.562 | 0.559 | 0.564 | 0.863 | Yes | 4 | stepback | 3 | 0.9149 |
| Res-CPG-PR-GA | r | 0.987 | 0.987 | 0.987 | 0.966 | Yes | 4 | stepback | 3 | 0.9625 |
| Res-CPG-GA | RMSE | 0.555 | 0.552 | 0.558 | 1.076 | Yes | 5 | stepbackfurther | 5 | 1.2025 |
| Res-CPG-GA | MAE | 0.271 | 0.270 | 0.272 | 0.863 | Yes | 5 | stepbackfurther | 5 | 0.9549 |
| Res-CPG-GA | r | 0.996 | 0.996 | 0.996 | 0.966 | Yes | 5 | stepbackfurther | 5 | 0.9645 |
| Res-CPG-rev-GA | RMSE | 0.512 | 0.506 | 0.519 | 1.076 | Yes | 6 | stepbackabit | 4 | 1.1867 |
| Res-CPG-rev-GA | MAE | 0.183 | 0.182 | 0.185 | 0.863 | Yes | 6 | stepbackabit | 4 | 0.9363 |
| Res-CPG-rev-GA | r | 0.997 | 0.997 | 0.997 | 0.966 | Yes | 6 | stepbackabit | 4 | 0.963 |
| Res-Conds-GA†| RMSE | 0.686 | 0.684 | 0.688 | 1.076 | Yes | ||||
| Res-Conds-GA†| MAE | 0.446 | 0.445 | 0.447 | 0.863 | Yes | ||||
| Res-Conds-GA†| r | 0.995 | 0.995 | 0.995 | 0.966 | Yes | ||||
| Res-Conds-GA*‡ | RMSE | 0.810 | 0.808 | 0.813 | 1.076 | Yes | ||||
| Res-Conds-GA*‡ | MAE | 0.561 | 0.559 | 0.563 | 0.863 | Yes | ||||
| Res-Conds-GA*‡ | r | 0.992 | 0.992 | 0.992 | 0.966 | Yes | ||||
| Res-Conds-GA§ | RMSE | 0.819 | 0.817 | 0.822 | 1.076 | Yes | 7 | slidingdoors | 2 | 1.0949 |
| Res-Conds-GA§ | MAE | 0.551 | 0.550 | 0.553 | 0.863 | Yes | 7 | slidingdoors | 2 | 0.8843 |
| Res-Conds-GA§ | r | 0.992 | 0.992 | 0.992 | 0.966 | Yes | 7 | slidingdoors | 2 | 0.9642 |
| Res-Conds-GA¶ | RMSE | 0.903 | 0.901 | 0.905 | 1.076 | Yes | ||||
| Res-Conds-GA¶ | MAE | 0.655 | 0.653 | 0.657 | 0.863 | Yes | ||||
| Res-Conds-GA¶ | r | 0.991 | 0.990 | 0.991 | 0.966 | Yes | ||||
| Res-Comb-PR-GA | RMSE | 0.724 | 0.721 | 0.726 | 1.076 | Yes | ||||
| Res-Comb-PR-GA | MAE | 0.496 | 0.495 | 0.497 | 0.863 | Yes | ||||
| Res-Comb-PR-GA | r | 0.994 | 0.994 | 0.994 | 0.966 | Yes | ||||
| Res-Comb-PRTB-GA | RMSE | 0.604 | 0.602 | 0.605 | 1.076 | Yes | ||||
| Res-Comb-PRTB-GA | MAE | 0.425 | 0.424 | 0.426 | 0.863 | Yes | ||||
| Res-Comb-PRTB-GA | r | 0.996 | 0.996 | 0.996 | 0.966 | Yes | ||||
| Res-Comb-GA | RMSE | 0.568 | 0.566 | 0.569 | 1.076 | Yes | 8 | pointofdivergence | 1 | 1.0772 |
| Res-Comb-GA | MAE | 0.389 | 0.388 | 0.390 | 0.863 | Yes | 8 | pointofdivergence | 1 | 0.8876 |
| Res-Comb-GA | r | 0.996 | 0.996 | 0.996 | 0.966 | Yes | 8 | pointofdivergence | 1 | 0.9663 |
| Res-CPG-Comb-GA†| RMSE | 1.777 | 1.769 | 1.784 | 1.076 | No | ||||
| Res-CPG-Comb-GA†| MAE | 1.116 | 1.110 | 1.121 | 0.863 | No | ||||
| Res-CPG-Comb-GA†| r | 0.971 | 0.971 | 0.971 | 0.966 | Yes | ||||
| Res-CPG-Comb-GA‡ | RMSE | 1.818 | 1.810 | 1.825 | 1.076 | No | ||||
| Res-CPG-Comb-GA‡ | MAE | 1.152 | 1.147 | 1.158 | 0.863 | No | ||||
| Res-CPG-Comb-GA‡ | r | 0.969 | 0.969 | 0.969 | 0.966 | Yes | ||||
| Res-CPG-Comb-GA§ | RMSE | 1.828 | 1.821 | 1.836 | 1.076 | No | ||||
| Res-CPG-Comb-GA§ | MAE | 1.151 | 1.145 | 1.156 | 0.863 | No | ||||
| Res-CPG-Comb-GA§ | r | 0.969 | 0.968 | 0.969 | 0.966 | Yes | ||||
| Res-CPG-Comb-GA¶ | RMSE | 1.837 | 1.830 | 1.844 | 1.076 | No | ||||
| Res-CPG-Comb-GA¶ | MAE | 1.171 | 1.166 | 1.177 | 0.863 | No | ||||
| Res-CPG-Comb-GA¶ | r | 0.968 | 0.968 | 0.969 | 0.966 | Yes |
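The `win` column in the table above appears to follow a simple interval rule. The sketch below encodes that rule as inferred from the table rows, not as stated in the text: `lb` and `ub` are assumed to be the lower and upper bounds of an interval around `avg`, an error metric wins when its upper bound is strictly below `current_best`, and r wins when its lower bound is strictly above it. The function name is hypothetical.

```python
def wins(metric, lb, ub, current_best):
    """Inferred rule for the `win` column (an assumption, not from the text).

    RMSE and MAE are errors, so lower is better: the whole interval must lie
    below the current best. r is a correlation, so higher is better: the whole
    interval must lie above the current best.
    """
    if metric in ("RMSE", "MAE"):
        return ub < current_best
    if metric == "r":
        return lb > current_best
    raise ValueError(f"unknown metric: {metric}")


# Rows copied from the table above.
res_cpg_pr_rmse = wins("RMSE", 1.062, 1.076, 1.076)  # table says No
res_cpg_pr_mae = wins("MAE", 0.559, 0.564, 0.863)    # table says Yes
normal_ga_r = wins("r", 0.967, 0.968, 0.966)         # table says Yes
res_comb_rmse = wins("RMSE", 0.566, 0.569, 1.076)    # table says Yes
```

The Res-CPG-PR-GA RMSE row, where `ub` equals `current_best` and the table reports No, is what suggests a strict inequality; rounding in the printed values could also explain it.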
[R console output: a list with one empty entry per model (Normal-GA, Res-CPG-rev-GA, Res-Conds-GA§, Res-Comb-GA); the entry contents were not recovered.]